Okay, so we're switching gears here.
We've done linear regression, which is finding a line that kind of models the behavior of
a point set.
What we want to do now is linear classification, and the idea is very simple: if you have a couple of points, some of them good, some of them bad, you want to be able, as in this case, to draw a line between them that separates them.
Any such line separates, in this case, the plane into two parts, the good and the bad, and that is what lets us make predictions.
In this classic example, you want to distinguish earthquakes from underground nuclear explosions, and you look at the waves your seismometers are registering. There are two kinds of waves: one is called the surface wave, and the other is called the body wave, and how big they are is apparently different for explosions and for earthquakes, so in this case you want to have a separator here.
We'll call a set of examples linearly separable if there is such a separating line, or hyperplane,
and inseparable if there's no linear separator.
Okay, and again, we have real-valued weights that give us the separator, just as before. We can classify the examples by whether they lie below the separator or above it: if the separator is given by these weights, then the points (x1, x2) where the weighted sum is bigger than zero are, in this case, the negative examples, and those where it is lower than zero are the positive examples.
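Written out, and hedging on the exact sign convention used on the slide (which side counts as which class is just a choice), the decision rule has roughly this shape:

```latex
h_{\mathbf{w}}(x_1, x_2) =
\begin{cases}
  1 & \text{if } w_0 + w_1 x_1 + w_2 x_2 > 0, \\
  0 & \text{otherwise.}
\end{cases}
```

Flipping the signs of all the weights swaps which side of the line counts as which class.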
Essentially, what you want is something that takes this space and transforms it into a space where the separator is simply the real line. That's what this does here, and again, we can do exactly the same thing as before.
If we introduce a dummy coordinate x0 = 1, then we can write the whole thing as a dot product again, and that makes things slightly simpler.
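As a minimal sketch (my own Python, not the lecture's code; the weights here are made up purely for illustration), the dummy-coordinate trick and the resulting dot-product classifier look like this:

```python
import numpy as np

def add_dummy(x):
    """Prepend the dummy coordinate x0 = 1 so the offset w0 is absorbed into the dot product."""
    return np.concatenate(([1.0], x))

def classify(w, x):
    """Hard-threshold classifier: 1 if w . x > 0, else 0."""
    return 1 if np.dot(w, add_dummy(x)) > 0 else 0

# Hypothetical separator in the (x1, x2) plane, given by weights (w0, w1, w2).
w = np.array([-4.9, -1.7, 1.0])
print(classify(w, np.array([2.0, 9.0])))   # one side of the line -> 1
print(classify(w, np.array([5.0, 2.0])))   # the other side        -> 0
```

With the dummy coordinate in place, the whole classifier is just a sign check on a single dot product.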
Okay, so if you think about solving this, the realization here is that you can think of it as a threshold function: greater or smaller than zero is all you have. If you want to minimize the loss, you are minimizing a function of T(w · x), and T is essentially a step function. Minimizing this step function has the problem that we lose differentiability: the threshold function looks like this, the derivative is zero here, the derivative is undefined here at the jump, and so on.
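In symbols, and up to the notation on the slide, the step function in question is:

```latex
T(z) = \begin{cases} 1 & \text{if } z > 0, \\ 0 & \text{otherwise,} \end{cases}
\qquad
T'(z) = 0 \ \text{for } z \neq 0, \quad T'(0) \ \text{undefined.}
```

So the gradient is zero almost everywhere and gives no direction to move in.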
Don't look for closed-form solutions with high-school methods; that doesn't work.
What still does work is gradient descent.
This looks like a curly Arabic letter, but it's really a calligraphic T, the threshold function. And yes, this here on the slide should be a one; that's the idea, thanks.
We can't use any closed-form solutions, but we can use the following update rule in gradient descent, which looks exactly like the one we had before: w_i ← w_i + α (y − h_w(x)) · x_i. That actually works.
If you think about it, we really have three possibilities here.
If y is the same as the hypothesis, we've correctly classified the example; then this term here is zero, and we do nothing.
If y is one, and we've classified it as zero, then we want to make this w · x here bigger, so the weights get pushed in the direction that increases the dot product.
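As a minimal sketch of that update loop (my own Python, not the lecture's code; the learning rate alpha and the toy data are made up for illustration), one way to run it is:

```python
import numpy as np

def h(w, x):
    """Hard-threshold hypothesis on an example that already includes the dummy coordinate x0 = 1."""
    return 1 if np.dot(w, x) > 0 else 0

def update(w, x, y, alpha=0.1):
    """One application of the rule w_i <- w_i + alpha * (y - h_w(x)) * x_i.

    - y == h_w(x): the factor (y - h) is 0, nothing changes.
    - y == 1, h == 0: the weights move so that w . x gets larger.
    - y == 0, h == 1: the weights move so that w . x gets smaller.
    """
    return w + alpha * (y - h(w, x)) * x

# Tiny, made-up training set: rows are (x0=1, x1, x2), labels are 0/1.
X = np.array([[1.0, 2.0, 9.0],
              [1.0, 5.0, 2.0],
              [1.0, 1.0, 7.0],
              [1.0, 6.0, 1.0]])
y = np.array([1, 0, 1, 0])

w = np.zeros(3)
for _ in range(25):                 # a few passes over the data
    for xi, yi in zip(X, y):
        w = update(w, xi, yi)
print(w, [h(w, xi) for xi in X])    # learned weights and resulting classifications
```

Because this toy data is linearly separable, the loop settles on weights that classify every example correctly; on inseparable data the rule generally never settles.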